Audio–Visual Segmentation
Abstract
We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised segmentation with a single sound source and 2) fully-supervised segmentation with multiple sound sources. To deal with the AVS problem, we propose a method that uses a temporal interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage the audio-visual mapping during training. Quantitative and qualitative experiments on AVSBench compare our approach to several existing methods from related tasks, demonstrating that the proposed method is promising for building a bridge between audio and visual semantics. Code is available at https://github.com/OpenNLPLab/AVSBench.
Related resources
Application of Topic Segmentation in Audiovisual Information Retrieval
Segmentation into topically coherent segments is one of the crucial points in information retrieval (IR). Suitable segmentation may improve the results of IR system and help users to find relevant passages faster. Segmentation is especially important in audiovisual recordings, in which the navigation is difficult. We present several methods used for topic segmentation, based on textual, audio a...
Audio content analysis for online audiovisual data segmentation and classification
While current approaches for audiovisual data segmentation and classification are mostly focused on visual cues, audio signals may actually play a more important role in content parsing for many applications. An approach to automatic segmentation and classification of audiovisual data based on audio content analysis is proposed. The audio signal from movies or TV programs is segmented and class...
Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation
In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, a state of the art Semi Adaptive Appearance Model (SAAM) approach developed by the authors is used for automatic lip tracking, and an adapted version of our vowel based speech segmentation system is employed to automati...
Plundermatics: Real-time Interactive Media Segmentation for Audiovisual Analysis, Composition and Performance
This paper presents methods for real-time automated media segmentation, interactive audiovisual analysis, and media search in composition and performance tasks. In addition, we detail a use case where these tools have been deployed successfully as part of high profile public, national broadcast events, installations and exhibitions. These tools utilise a combination of data-mining and informati...
Segmentation and Annotation of Audiovisual Recordings Based on Automated Speech Recognition
Searching multimedia data in particular audiovisual data is still a challenging task to fulfill. The number of digital video recordings has increased dramatically as recording technology has become more affordable and network infrastructure has become easy enough to provide download and streaming solutions. But, the accessibility and traceability of its content for further use is still rather l...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-19836-6_22